Hand pose estimation through semi-supervised and weakly-supervised learning
Abstract
We propose a method for hand pose estimation based on a deep regressor trained on two different kinds of input. Raw depth data is fused with an intermediate representation in the form of a segmentation of the hand into parts. This intermediate representation contains important topological information and provides useful cues for reasoning about joint locations. The mapping from raw depth to segmentation maps is learned in a semi- and weakly-supervised way from two different datasets: (i) a synthetic dataset created through a rendering pipeline, which includes dense ground-truth labels (pixel-wise segmentations); and (ii) a dataset of real images for which ground-truth joint positions are available, but not dense segmentations. The loss for training on real images is generated by a patch-wise restoration process, which aligns tentative segmentation maps with a large dictionary of synthetic poses. The underlying premise is that the domain shift between synthetic and real data is smaller in the intermediate representation, where labels carry geometric and topological meaning, than in the raw input domain. Experiments on the NYU dataset [1] show that the proposed training method reduces joint error by 15.7% compared with direct regression of joints from depth data.
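The patch-wise restoration step can be pictured with a minimal sketch. The abstract does not state the patch size, the matching criterion, or the form of the loss, so everything below is an assumption made for illustration: patches are tiny grids of part labels, matching uses per-pixel label disagreement (Hamming distance), and the loss is the fraction of pixels in the tentative patch that disagree with the restored patch. The names `hamming`, `restore_patch`, and `restoration_loss` are hypothetical and do not come from the paper.

```python
from typing import List, Sequence

Patch = List[List[int]]  # small grid of hand-part labels (0 = background)


def hamming(a: Patch, b: Patch) -> int:
    """Count pixels whose part labels differ between two same-sized patches."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def restore_patch(tentative: Patch, dictionary: Sequence[Patch]) -> Patch:
    """Return the dictionary patch closest to the tentative one.

    Nearest neighbour under Hamming distance is an assumed criterion.
    """
    return min(dictionary, key=lambda cand: hamming(tentative, cand))


def restoration_loss(tentative: Patch, dictionary: Sequence[Patch]) -> float:
    """Fraction of pixels that disagree with the restored patch (assumed loss)."""
    restored = restore_patch(tentative, dictionary)
    n_pixels = len(tentative) * len(tentative[0])
    return hamming(tentative, restored) / n_pixels


# Toy dictionary of synthetic 2x2 label patches (hypothetical data).
synthetic_patches = [
    [[1, 1], [1, 1]],  # a single part, e.g. palm
    [[1, 2], [1, 2]],  # boundary between two parts
    [[0, 0], [0, 0]],  # background only
]

# Tentative prediction on a real image: one pixel away from the boundary patch.
tentative_patch = [[1, 2], [2, 2]]

restored = restore_patch(tentative_patch, synthetic_patches)
loss = restoration_loss(tentative_patch, synthetic_patches)
print(restored, loss)
```

In a trained network the disagreement count would presumably be replaced by a differentiable loss, such as cross-entropy between the predicted label distribution and the restored labels; that substitution is likewise an assumption, not a detail given in the abstract.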
Similar resources
Hand Pose Estimation through Weakly-Supervised Learning of a Rich Intermediate Representation
We propose a method for hand pose estimation based on a deep regressor trained on two different kinds of input. Raw depth data is fused with an intermediate representation in the form of a segmentation of the hand into parts. This intermediate representation contains important topological information and provides useful cues for reasoning about joint locations. The mapping from raw depth to seg...
Structure-Aware and Temporally Coherent 3D Human Pose Estimation
Deep learning methods for 3D human pose estimation from RGB images require a huge amount of domain-specific labeled data for good in-the-wild performance. However, obtaining annotated 3D pose data requires a complex motion capture setup which is generally limited to controlled settings. We propose a semi-supervised learning method using a structure-aware loss function which is able to utilize a...
Conditional Models for 3D Human Pose Estimation
Dissertation by Atul Kanaujia (dissertation director: Dimitris Metaxas). Human 3D pose estimation from a monocular sequence is a challenging problem, owing to the highly articulated structure of the human body, varied anthropometry, self-occlusion, depth ambiguities, and large variability in the appearance and background in which humans may appear. Conve...
Unsupervised Geometry-Aware Representation for 3D Human Pose Estimation
Modern 3D human pose estimation techniques rely on deep networks, which require large amounts of training data. While weakly-supervised methods require less supervision by utilizing 2D poses or multi-view imagery without annotations, they still need a sufficiently large set of samples with 3D annotations for learning to succeed. In this paper, we propose to overcome this problem by learning a g...
Semi-supervised Facial Expression Recognition Algorithm on The Condition of Multi-pose
A major challenge in pattern recognition is the labeling of large numbers of samples. This problem has been addressed by extending supervised learning to semi-supervised learning, which has therefore become one of the most important methods in facial expression recognition research. Frontal and unoccluded face images have been well recognized using traditional facial expression r...
Journal:
- Computer Vision and Image Understanding
Volume 164
Published 2017